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Abstract 

Background: Accurately predicting low energy barrier folding pathways between conformational secondary 
structures of an RNA molecule can provide valuable information for understanding its catalytic and regulatory 
functions. Most existing heuristic algorithms guide the construction of folding pathways by free energies of 
intermediate structures in the next move during the folding. However due to the size and ruggedness of RNA 
energy landscape, energy-guided search can become trapped in local optima. 

Results: In this paper, we propose an algorithm that guides the construction of folding pathways through the 
formation and destruction of RNA stacks. Guiding the construction of folding pathways by coarse grained 
movements of RNA stacks can help reduce the search space and make it easier to jump out of local optima. 
RNAEAPath is able to find lower energy barrier folding pathways between secondary structures of conformational 
switches and outperforms the existing heuristic algorithms in most test cases. 

Conclusions: RNAEAPath provides an alternate approach for predicting low-barrier folding pathways between RNA 
conformational secondary structures. The source code of RNAEAPath and the test data sets are available at http:// 
genome.ucf.edu/RNAEAPath. 



Introduction 

RNA molecules play critical roles in the cell. The second- 
ary structures of RNA molecules have been extensively 
studied because they provide insights into the functional- 
ity of RNAs. Native (functional) RNA secondary struc- 
tures are usually thermodynamically stable and many of 
them are also the minimum free energy (MFE) structures. 
Nevertheless, at times, RNA molecules may fold into 
alternative secondary structures in order to participate in 
certain biological processes. For example, the SV-11 
RNA folds into a metastable conformational structure 
and acts as a template for its own replication using QJ3 
replicase [1,2]. Further, RNA conformational switches 
can transform between alternative secondary structures 
dynamically in response to various environmental stimuli 
(such as heat shock and cold shock) [3-6], and carry out 
RNA-mediated biological activities, such as switching on 
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or off downstream gene translation activities [7-9], regu- 
lating RNA splicing via multiple-state splicesomal con- 
formations [10], and regulating the life cycles of virus 
[11]. 

The conformational transformations between alterna- 
tive structures involve the folding of an RNA molecule 
into a series of sequential adjacent intermediate struc- 
tures [12]. RNA folding pathways provide valuable infor- 
mation for understanding the catalytic and regulatory 
functions of RNAs (such as hok/sok of plasmid Rl [13]). 
RNA folding pathways may also impact subsequence bio- 
logical events (such as formation of tertiary structures). 
Furthermore, prediction algorithms can help the design 
of RNA switches by providing prescribed structural 
alternatives. 

In this paper, we present a new approach, RNAEAPath, 
for computing near optimal direct or indirect folding 
pathways between two secondary structures of an RNA 
molecule. We guide the search for low energy barrier 
folding pathways by integrating a variety of strategies for 
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simulating the formation and destruction of RNA stacks 
in a flexible framework. Benchmark tests on conforma- 
tional switches show that RNAEAPath produces lower 
energy barrier folding pathways and outperforms the 
existing heuristic approaches in most test cases. 

Preliminary 

Consider an RNA sequence as a string x = Xi ... x„ of n let- 
ters over alphabet I = {A, U, G, Q. A pair of complemen- 
tary nucleotides x t and Xp can form hydrogen bonds and 
interact with each other, denoted by x t ■ Xp In this paper, 
we only consider the canonical base pairings {A ■ U and 
G • C) and the wobble base pairing (G • U). A secondary 
structure S of the RNA sequence x is a set of disjoint 
paired bases (i, /'), where 1 < i <j < n.S may be represented 
by a length n string of dots and brackets, where dots 
represent unpaired bases and brackets represent paired 
bases. An RNA structure can comprise of stacks which are 
lists of consecutive base pairs ({(/,/), (Z + 1,/ - 1), ... , (Z + 
w, j - w)} such that x t • Xp . . . , x i+w ■ Xp w ), and unstacking 
base pairs. A secondary structure is pseudoknotted if it 
contains two base pairs (z, /') and (/', /) with i <i' </' </. In 
this paper, we only consider pseudoknot-free structures. A 
base pair is compatible with a secondary structure if the 
base pair can be added to the structure without leading to 
a pseudoknotted structure or pairing a base with more 
than one partner. A stack is compatible with S if each base 
pair in the stack is either in S or is compatible with S. 

The free energy of a secondary structure S is denoted 
by E(S). The set of neighboring structures of S consists 
of all structures that differ from S by an addition or 
deletion of exactly one base pair. For two secondary 
structures A and B, the distance between A and B is the 
number of base pairs in A not in B plus the number of 
base pairs in B not in A (i.e. \(A - B) U (B - A)\). A fold- 
ing pathway from A to B is a sequence of intermediate 
structures A = S 0 , . . . , S m = B such that for all 0 < i 
<m, intermediate structure S i+1 is a neighboring struc- 
ture of Sj. A folding pathway is direct if the intermediate 
structures contain only base pairs in A and B (i.e. 5, £ 
A U B for 1 < Z <m) and otherwise is indirect. The sad- 
dle point of a pathway is an intermediate structure with 
the highest energy, and the energy barrier of a pathway 
is the energy difference between its saddle point and the 
initial structure. Since the folding of RNA structures is 
thermodynamically-driven and tends to avoid high- 
energy intermediate structures, current computational 
methods aim to find RNA folding pathways with the 
lowest energy barriers. 

Previous studies 

A lot of research has been done on predicting low 
energy barrier folding pathways. Morgan and Higgs pro- 
posed a greedy algorithm that employs the Nussinov 



model [14,15] for computing direct folding pathways 
with minimum energy barrier. They also described a 
heuristic that samples low energy structures from the 
partition function and glues them together by direct 
pathways [16]. The Nussinov model is simple and easy 
to implement, in which base stacking and loop entropies 
have no energetic contributions. Based on this model, 
Thachuk et al. [17] developed an exact algorithm, 
PathwayHunter, which exploits elegant properties of 
bipartite graphs for finding the globally optimal direct 
pathways. However, the Nussinov model is not as accu- 
rate as the Turner energy model [18,19] for approximat- 
ing RNA thermodynamics. An exact solution based on 
the Turner energy model is also available. BARRIERS 
[20,21], exactly computes the globally optimal folding 
pathways between any two locally optimal secondary 
structures. BARRIERS reads an energy sorted list of 
RNA secondary structural conformations produced by 
RNAsubopt [22] and is able to compute both direct and 
indirect low energy barrier pathways. 

Nevertheless, the above exact solutions are all expo- 
nential in time, because the problem itself is NP-hard 
[23]. Many heuristic algorithms have also been proposed 
following the seminal work of Morgan and Higgs. 
Flamm et al. [24] used breadth-first search in their 
heuristics (in Vienna RNA Package [25]) and kept the 
best k candidates at each step to bound the search. Voss 
et al. [26] devised a straightforward strategy for greedily 
searching direct pathways. Geis et al. [27] described a 
greedy heuristic to explore the search space of direct 
pathways and they also integrated look ahead techniques 
to diminish the search space. Recently, Dotu et al. [28] 
developed RNATabuPath, a fast heuristic that employs a 
TABU semi-greedy search to construct near optimal 
(both direct and indirect) folding trajectories. In addi- 
tion, other heuristic approaches, by splitting the path- 
ways into shorter pathways and solving each 
individually, have also been proposed [29,30]. There are 
also other formula presented for the prediction of RNA 
folding kinetics (see Flamm and Hofacker's review [31] 
for a systematic discussion). 

Many of the existing heuristic algorithms start from an 
initial structure A, and, at each single step i, walk from 
the intermediate structure S, to one of its neighbors S,- +1 
until finally the end structure B is reached. The definition 
of neighborhood relationships as well as the fitness func- 
tions can be different. The fitness function of 5 ; is usually 
defined on the free energy of S it or the distance from S,- 
to B, or a function of both. In general, greedy algorithms 
select the 'best' neighbor structure that has the best fit- 
ness. In contrast, semi-greedy algorithms may select any 
one from the top k structures for randomization. RNA- 
TabuPath, which is more sophisticated and outperforms 
other methods [28], keeps a tabu list for saving recently 
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taken moves such that they can not be applied in certain 
steps until being removed from the tabu list. In general, 
during the construction of a folding pathway, these heur- 
istic algorithms select the next intermediate structures 
from a set of neighboring structures that have the top 
lowest free energy or have the top shortest distance to B 
(or the combination of both). 

Motivations 

However, using energy to guide the construction of fold- 
ing pathways in the above-mentioned heuristic algo- 
rithms has its downsides. The RNA energy landscapes 
can be extremely large and rugged [11,32] and the rug- 
gedness of RNA energy landscape may cause the 
energy-guided search to become trapped in a local opti- 
mum. Similar to using structural rearrangements for 
modeling RNA folding kinetics [33], we want to con- 
struct candidate folding pathways in a manner that 
make it easier to jump out of local optima. It has been 
revealed that stacking base pairs contribute significantly 
to the stabilization of RNA secondary structures [34,35]. 
The dominant RNA folding pathways involve the forma- 
tion and destruction of the stacks, and the cooperative 
formation of a stack along with the partial melting of an 
incompatible stack [36]. In this paper, we propose to 
guide the construction of pathways by the formation 
and destruction of stacks (not by free energy or by dis- 
tance to the end structure). We still select the con- 
structed folding pathways according to their energy 
barriers. Although the construction of folding pathways 
is not driven by thermodynamics, the selection of fold- 
ing pathways is based on energy barriers. Guiding the 
construction of folding pathways by coarse grained 
movements of RNA stacks may help reduce the search 
space and makes it easier to jump out of local optima. 
In the rest of this paper, the Methods section describes 
the representation of folding pathways and the detailed 
strategies employed by RNAEAPath. The Results and 
Discussion section presents benchmarking results of 
RNAEAPath against existing methods followed by con- 
cluding remarks in the Conclusions section. 

Methods 

Representation of RNA folding pathways 

Given an initial structure A and an end structure B, we 
use a sequence of actions successively applied to A, 
rather than a sequence of intermediate structures, to 
represent a folding pathway from A to B. Representing a 
pathway by an action chain can avoid cyclic additions 
and deletions of base pairs and make it easy to simulate 
the formation and deletion of RNA stacks. A similar 
representation has also been employed in the previous 
work of Thachuk et al. [17]. 



We use two types of actions, add/y and del i; in the 
representation of RNA folding pathways. For an inter- 
mediate secondary structure S of an RNA sequence x, 
the action add ij7 denotes the 'add'ition of base pair (i, j) 
to S (i.e. add^S) = S U {{i, /')}) and del^ denotes the 
'deletion of base pair (i, j) from S (i.e. del y(S) = S - {(/, 
/)}). An action is direct if it concerns a base pair in A U 
B and indirect otherwise. The simplest direct pathways 
from A to B concern sequential deletions of all base 
pairs in A - B followed by additions of all base pairs in 
B-A. 

Consider an example sequence x = GGGGAAAA 
CCCCUUUU with initial and final structures shown in 
Figure 1. This simple pathway is obtained by first delet- 
ing all GC pairs from A until the RNA is single 
stranded, and then adding all AU pairs until B is 
obtained. Note that each intermediate structure 5, differs 
from both its successor and predecessor by exactly one 
base pair. The actions in the example are all direct 
actions and the energy barrier is 5.50 - (-6.60) = 12.10 
kcal/mol. 

An addition action addy(S) conflicts with S if either x t or 
Xj is already paired in S, and it clashes with S if there exists 
a base pair {(»:•, xj) € S\i < i' < j < f or i' < i < f < j}. 
A deletion action dely(S) conflicts with S if (*,-, xj) £ S. An 
addition or deletion action is valid and can be applied to S 
properly if it neither conflicts with nor clashes with S. 

A pathway from A to B can be represented by an action 
chain, which is a sequence of valid actions a lt . . . , a m 
such that S 0 = A, S t = a t (S t -\) for 1 < t < m and S m = B. 
Note that an action chain for A to B implies a sequence 
of valid actions that can be successively applied to A 
without introducing conflicts or clashes and produce B. 
We use the term "action chain" when the sequence is 
certified to be valid, and the term "sequence of actions" if 
its validity is not guaranteed. 

This representation of a pathway p from A to B has 
the following important properties. First, every folding 
pathway can be represented by a unique action chain 
and every action chain represents a unique folding path- 
way (note that it is not necessarily true for a sequence 
of actions). Second, rearranging the order of actions in 
p results in a new sequence of actions which represents 
a new folding pathway from A to B when it is valid. (It 
is an action chain that can be successively applied to A 
properly and obtain B.) Third, introducing a pair of 
complementary actions (e.g. add,-^ and del ,-,,-) to p 
results in a new sequence of actions which also repre- 
sents a new folding pathway from A to B if it is valid. 

In RNAEAPath, folding pathways are represented in 
the form of action chains, instead of a sequence of inter- 
mediate structures. This representation makes the life 
cycle of a folding pathway transparent to the algorithm 
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Figure 1 A simple folding pathway that converts an RNA sequence from structure A to 8. A simple folding pathway that converts an RNA 
sequence from structure A to S. The leftmost column shows a simple direct pathway from A to 6, the center column shows the free energies 
(in kcal/mol) of the intermediate structures, and the rightmost column presents the action chain 0, a s for this pathway. 



and also makes it easier for us to simulate the coopera- 
tive formation and destruction of RNA stacks by re- 
arranging the order of actions or introducing multiple 
pairs of complementary actions. 

Predicting low energy barrier folding pathways 

Given an RNA sequence x, an initial structure A and a 
final structure B, RNAEAPath computes a near optimal 
low energy barrier folding pathway from A to B in an 
evolutionary algorithm framework [37]. Figure 2 eluci- 
dates the overall paradigm for RNAEAPath. In this algo- 
rithm, the population of each generation is comprised of 
folding pathways ordered by their fitness. The functions 
My(p) are mutation strategies, each of which takes in a 
pathway p and produces a set of offspring pathways. 
These mutation strategies are central to the effectiveness 
of RNAEAPath and will be discussed in the Mutation 
strategies subsection. £ u £ 2 , £3, MAX and y are positive 
integer control parameters. 

The initial population of RNAEAPath, ? 0 , is filled 
with a set of simple pathways. Then, the algorithm 
goes through several iterations. P^-i is the population 
of the k - 1 st iteration. In the k th iteration, the algo- 
rithm produces <Dk (an ordered list of pathways) and P^ 
(the population of the k* iteration) from P^. Ofe stores 
the best l\ pathways in P^.i and the best €2 pathways 
produced by each p e P^-i- More specifically, each 
pathway p e P^i produces tj? offsprings through every 
mutation strategy M y (l < y < Y). The resulting 



offsprings produced by p are stored in a temporary list 
T, and the top £ 2 pathways are added to Finally, the 
best solution of the k th iteration, termed as OPT^, is 
the best pathway in And, P^ (the population of the 
k th iteration) is composed of the best £3 pathways of Ofe 
and will be used in the next iteration to produce P*+i. 
This helps keep the diversity of the population large, 
since contains at most £ 2 offsprings produced by 
each p e P k . u no matter how many high-qualified off- 
springs are produced by each pathway. The algorithm 
terminates when a stopping condition is met, and it 
returns the best solution of the last iteration. Since 0i> 
retains the best £\ pathways from P^.j in each iteration, 
the best one ever encountered by the algorithm is 
retained in lists Ofe and P*, and stored in OPT^. So, 
OPT^ has no worse fitness when compared to OPT^, 
and RNAEAPath always returns the best action chain 
it ever discovered. 

In the remaining of this section, we discuss details 
regarding fitness evaluation, initialization of the popula- 
tion, stopping conditions and mutation strategies of 
RNAEAPath. 
Fitness of action chains 

The order of folding pathways (valid action chains) is 
primarily determined by their energy barriers. In case of 
a tie, the order is determined by the average of energy 
differences between the initial structure A and inter- 
mediate structures. Note that lower energies are pre- 
ferred in the previous two methods of ordering. If a tie 
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Procedure: RNAEAPath(x, A, B) 
l: A <- \E(B) - E(A)\ 

2: k «- 0 

3: Initialize Po and sort individuals in it by energy barriers 

4: OPT 0 ^P 0 [1] 

5: while !STOP(/c, OPT, A) do 

6: k «- k + 1 

7: Ofc<-P fc _i[l...*i] 

8: for all p e Pfc-i do 

9= T^(U^=iM tf (p)) 

10: Ofc<-O fc UT[l...* 2 ] 
11: end for 

12: OPT A; =O fc [l] 
13: F k <-O k [l...£ 3 ] 

14: end while 
15: return OPT fc 

Figure 2 Overview of RNAEAPath. Overview of RNAEAPath. In this procedure, the input is an RNA sequence x with the start and end 
structures A and 8, and the output is the best folding pathway in k iterations (OPT/J. For notations, k is the number of iterations, P 0 is the initial 
population, and P'k is the folding pathway population of the k' h iteration. J contains all the offspring folding pathways produced by applying 
mutation strategies Mi, ... , My to each pathway p in the (k - 1) st population, ©fe is an ordered list of offspring folding pathways of the k' h 
generation, from which, the population for the next iteration ( P^ +1 ) is selected. Folding pathways in P* and Ofe are sorted based on their fitness 
and PjtD . . . £] are the top £ best folding pathways in P*. 



still exists, then shorter action chains are preferred. 
Action chains are ordered arbitrarily if their relative 
order can not be determined based on these three 
criteria. 

The initial population of folding pathways 

The initial population, P 0 , contains 4 simple pathways 
from A to B formed by first deleting all base pairs in A - 
B and then adding those in B - A, similar to the pathway 
shown in Figure 1. Although we can also arrange base 
pair deletions and additions in an arbitrary order, we tai- 
lor them in a manner that simulates successive degrada- 
tion and formation of RNA stacks. This is because 
random deletions and additions of base pairs tend to 
form additional unpaired loop regions that introduce 
entropic penalties (see Figure 3 for an illustration). We 
can degrade or form each stack either from the outmost 
base pair to the innermost base pair or vice verse. 
Usually, it yields a lower energy barrier if we degrade a 
stack from the outmost base pair to the innermost base 
pair and form a stack from the innermost base pair to the 
outmost base pair. However, for the sake of simplicity 
and generosity, we construct 4 simple pathways in P 0 , 
which degrade all the stacks from the same direction and 
form all the stacks from the same direction. These simple 



pathways constitute a diversified and unbiased initial 

population for the algorithm to start from. 

The number of offsprings produced by each mutation 

strategy 

In each generation, the expected total number of off- 
springs produced by each individual is a constant posi- 
tive integer £. The number of offsprings that each 
individual produces using mutation strategy 
M y , (1 < y < Y), in the k 01 generation, is denoted by 1^. 
In the initial generation, is equivalent to Q/Y for all 
the mutation strategies. In the k th generation, l\y is 
determined adaptively according to the quality of the 
offsprings produced using M } , in the k - 1 st iteration. Let 
b 1 ^ 1 be number of offsprings that are both produced 
through My and selected to construct P^-i, the popula- 
tion of the k - l s£ generation. Then, in the If* gen- 
eration is computed as follows. 
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Figure 3 Two different folding pathways that form an identical stack. Two different folding pathways that form an identical stack. Left: The 
stack is formed successively. Right: The stack is constructed by random formation of base pairs. The right pathway yields a higher energy barrier 
because the randomly introduced base pairs form unpaired loop regions that result in additional entropic penalties. 



Mutation strategies that have produced more high 
quality offsprings in the (k - \) st iteration are allowed to 
generate more offsprings in the k* generation. In con- 
trast, mutation strategies that perform poorly in the k - 
1 st generation, are only allowed to generate a small 
number (£ mj „, with default value 3) of offsprings. Note 
that, the sum of I for 1 < y < Y may be greater than I. 
Stopping conditions 

The algorithm terminates when (1) the current best 
solution achieves the lowest possible value \E(B) - E(A)\, 
or (2) when no improvement has been found over y 
consecutive iterations (a plateau), or (3) when MAX 
number of iterations have passed and successive itera- 
tions do not discover better results. Note that the algo- 
rithm may simulate further than MAX iterations if 
improvements are made in the very last iteration and it 
stops immediately if no improvement is made between 
successive iterations. More specifically, the algorithm 
stops when any of the following conditions is satisfied: 

1. the energy barrier of OPTV is equivalent to \E(B) - 
E(A)\. 

2. k >y and the fitness of OPT^ is equivalent to that 

Of OPTVy 

3. k > MAX and the fitness of OPTV is equivalent to 
that of OPT*.!. 

Mutation strategies 

In RNAEAPath, the mutation strategies employed to 
evolve folding pathways can be categorized into three 
types: (1) rearranging the order of actions, (2) introdu- 
cing indirect pathways and (3) formation of a single 
stack or cooperative conversion of a pair of incompati- 
ble stacks. In this section, let Mi, . . . , My denote the 
mutation strategies and let p = a.\, . . . , a m denote the 
input pathway A = S 0 , . . . , S m = B. For each mutation 
strategy M y (p), we describe the process for generating 



one new pathway q using each mutation strategy when 
given p. 

Type 1: reordering of actions 

As described in the subsection of representation of RNA 
folding pathways, shuffling the order of actions of the 
input pathway p can result in a new pathway from A to 
B. In RNAEAPath, two mutation strategies of this type 
are employed. Mi changes the position of an arbitrary 
action, and M 2 swaps the positions of two arbitrary 
actions. 

Mi: Let Mj 1,t2 (p) denote the sequence of actions 
obtained by first removing an action &u (1 < t\ < m) from 
p and then inserting it after dt 2 , for all t 2 e {0,..., t\ - 1, 
t\ + 1, . . . , m}. Note that the resulting sequence of 
actions may not necessarily be a valid action chain. For 
instance, in Figure 1, Mj' 4 (p) = a 2 , a 3 , a 4 , ax, a 5 , . . . , a 8 
and Mj' 2 (p)=p are valid action chains, while 
Mj' x (p) = ai,a & ,a 2 , . . . ,a 7 is not. 

The procedure for computing Mj' t2 (p) is described in 
the following. 

1. Choose ti uniformly at random from the interval 
[1, m\. 

2. Compute the interval [/, u], {tx <l <u <m), where / 
is the minimum and u is the maximum such that 
for all t 2 e [/, u] and t 2 * t x , Mj' t2 (p) is a valid action 
chain. 

3. Choose t 2 from the interval [/, u]. 

3.1. If flti is an addition operation, for all / < t <t' 

< u and t * t' t= t\, the probability of choosing t 
is greater than that of t'. 

3.2. Otherwise (a deletion operation), for all / < t 

< t' < u and t * t' * t\, the probability of choos- 
ing t is less than that of t' 

We do not choose t 2 {t 2 * tx) uniformly at random in 
[/, u], instead, we tend to place addition operations in 
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the front part of p, and deletion operations in the later 
part of p. This is because adding base pairs early and 
deleting them late during the folding may help stabilize 
the intermediate secondary structures. (Please see Addi- 
tional file 1 for the detailed description of the discrete 
probability.) 

M 2 : Let M 2 1,t2 (p) denote the sequence of actions obtained 
by swapping «ti with «i 2 . If the resulting sequence of 
actions is a valid action chain, let it be q; otherwise, restart 
the process. For example, in Figure 1, Di^' 8 (p) i s not a va lid 
action chain, while tM^' 4 (p) = fli, a 4 , a 3 , a 2 , a 5 , ... ,a s is. t x 
and t 2 are chosen uniformly at random from {(tx, t 2 ): 1 < 
tx <t 2 < m}. 

Mutation strategies of type 1 provide methods for 
shuffling the order of actions of an input pathway and 
generating slightly different new pathways. However, 
these strategies are not capable of introducing additional 
(indirect) base pairs, and the offsprings of a direct path- 
way produced through type 1 strategies are also direct. 
In the following, we will describe mutation strategies 
that are able to construct indirect pathways from a 
direct pathway. 

Type 2: introducing indirect pathways by adding a pair of 
complementary actions 

Morgan and Higgs [16] pointed out that the optimal 
folding paths are generally indirect pathways. This idea 
was further described by Dotu et al. [28]. The tempor- 
ary formation of base pairs, especially those base pairs 
that do not belong to A U B, may lower the energies of 
intermediate structures and thus render better folding 
pathways. Similarly, temporary deletion and reformation 
of a base pair also can create an indirect pathway. 

Di 3 : Let Y\^' t2,+ ^''\p) denote the sequence of actions 
obtained by introducing an addition action add^ after a h 
and its complementary action del,-, after at 2 . Let 
M3 1,t2 ~' !J '(p) denote the sequence of actions obtained by 
introducing a deletion action del i>; after and its com- 
plementary action add iy after «t 2 . For example, in Figure 1, 
Y\l' 7 ' +{hl6 \p) = ai,add ljl6 , a 2 , a 7 , del 1>16 , a 8 . The pro- 
cedures for computing ^ ,t2 ' +(i, '\p) and M 3 1,t2 ~ (i ' ;) (p) are 
similar to each other. In the following, we only describe 
the procedure for computing f\ t ^' tl,+ ^ (py 

1. Choose t\ uniformly at random from the interval 
[1, m], and obtain the associated intermediate struc- 
ture S h , 

2. Find a set of base pairs that neither conflict with 
nor clash with S tl and choose a base pair (i, j) uni- 
formly at random from the set. 

3. Compute the interval [/, u], {tx <l <u <m), where / 
is the minimum and u is the maximum such that 
for all values t 2 e [/, u] the resulting sequence of 
actions of f\ t ^' tl,+ ^ [p) is a valid action chain. 



4. Choose t 2 from the interval [/, u] with the prob- 
ability of choosing t greater than that of t' for all 
t >{. (This is because (i, /') is not likely to be deleted 
soon after its formation.) 

Mutation strategy M 3 is capable of producing an indir- 
ect pathway from a direct pathway. In addition, a proper 
combination of multiple applications of M 3 may result in 
a pathway which simulates the successive formation and 
deletion of a temporary stack during the folding. Take 
the pathway p in Figure 1 as an example, we can con- 
struct a pathway q that forms a temporary stack consist- 
ing of all the GU base pairs via a multiple application of 
M 3 , q= M^ +(3 ' 14) (Mf' +(2 ' 15) (Mf' +(U6) (p))> 
Type 3: formation of a single stack or simultaneous 
formation and deletion of a pair of incompatible stacks 
In this section, we will introduce mutation strategies for 
producing pathways that involve with formation and 
deletion of stacks. To perform this type of strategies, we 
first need to find all possible stacks in an RNA sequence 
x. We use the algorithm of Bafna et al. [38] to find the 
set of all possible stacks with more than 3 consecutive 
base pairs, and denote it by STA(x). There are two stra- 
tegies in Type 3: formation of a single stack (M4) and 
simultaneous formation and destruction of a pair of 
incompatible stacks (M 5 ). 

M4: Let M^p) denote the sequence of actions 
obtained by forcing the formation of a stack stacks e 
STA after action a t , where stackh is compatible with S t . 
The following describes the procedure for computing 
Mf(p). 

1. Choose t uniformly at random from the interval 
[1, m], and obtain the associated intermediate struc- 
ture S t . 

2. Find a set of stacks that neither conflict with nor 
clash with S t , and pick up a stack stack h uniformly at 
random from the set. 

3. Ensure that each base pair (i, j) in {stack h - S t } is 
sequentially (from the innermost base pair to the 
outmost base pair) formed after a t . 

3.1. If an action addy appears in {« t +i> • • • > a m }> 
move it up and place it after a t using strategy Mi. 

3.2. Otherwise, introduce a pair of complemen- 
tary actions add;,; and del,-, to p after a t using 
strategy M 3 . 

We can introduce additional stacks that are compati- 
ble with S t using M 4 by forcing a sequence of addition 
actions successively forming base pairs in {stack h - S t ], 
after a t . 

M 5 : Let M^p) denote the sequence of actions obtained 
by forcing the formation of a stack stack h e STA which 
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is incompatible with S t , after action a t . Shown on the 
right side of Figure 4 is a folding pathway which simul- 
taneously destructs and forms a pair of incompatible 
stacks. Shown on the left side is a simple folding path- 
way which has exactly the same start and end structures, 
while it folds into a single stranded structure during the 
folding. Usually, the pathway on the right has lower 
energy barrier than the one on the left because it never 
folds into a single stranded structure. The folding path- 
way on the right side of Figure 4 can be introduced 
using strategy IM 5 . And, the procedure for computing 
V\ l ^{p) is as follows: 



1. Choose an arbitrary deletion action a t = dely from 
p, and obtain the associated intermediate structure S t . 

2. Find a set of stacks which either conflicts with or 
clashes with S t , and choose a stack stack h uniformly 
at random from the set. 

3. For each base pair (i', /") in {stacks - S t } that is 
compatible with S t , place add/y to p after a t using 
strategy M 4 . 

4. For each base pair (i', /") in {stack h - S t } that is 
incompatible with S t , 

4.1. Find all the base pairs (/*, /'*) in S t that are 
incompatible with (i" ', /), and ensure that each 
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Figure 4 Two different folding pathways with the identical initial and final secondary structures. Two different 
identical initial and final secondary structures. Left: Folding pathway 1 destroys a stack completely before an incompati 
Folding pathway 2 destructs a stack and forms an incompatible stack simultaneously. 



biding pathways with the 
ble stack is formed. Right: 
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base pair (z*, /'*) is deleted before the action 

add,-y>. 

4.2. If a action del;*j* appears in {a t +\, . . . , a m }, 
move it up before add,., y -using strategy Mi. 

4.3. Otherwise, introduce a pair of complemen- 
tary actions del,«,j« and add;* fusing strategy M 3 . 

Using M 5 , we can introduce the simultaneous forma- 
tion of a stack stacks, which is incompatible with S t , and 
destruction of existent stacks (or base pairs) that ham- 
per the formation of stack h . Since cooperative formation 
and destruction of stacks may contribute additional 
stacking energies for stabilizing the intermediate struc- 
tures, better folding pathways with lower energy barriers 
may be rendered. 

Results and discussion 

Benchmark tests 

We benchmarked RNAEAPath against existing methods 
(BARRIEERS [20,21], PathwayHunter [17], Find-path 
[24], and RNATabuPath [28]) by predicting low energy 
barrier folding pathways between two designated RNA 
secondary structures of 18 conformational switches. All 
the conformational switches were taken from the work of 
Dotu et. al [28]. Five of them are riboswitches, including 
rbl, rb2, rb3, rb4, and rb5. The metastable structures of 
these riboswitches have been experimentally determined 
by inline probing [9,39]. The thirteen remaining cases 



concern conformational switches, including hole, SL 
(Spliced leader RNA), sl5, s-box leader, thiM leader, 
ms2, HDV, dsrA, ribD leader, amv, alpha operon and 
HIV-1 leader. Sequences of these conformational 
switches can also be obtained from paRNAss web site 
[40], and some of the metastable secondary structures 
were computationally determined using RNAbor [41]. 

We summarize the results computed by PathwayHun- 
ter, the results computed by BARRIERS, the results com- 
puted by Findpath (with the look ahead parameter k = 
10), the best results over 1000 runs found by RNATabu- 
Path, and the best results over 1 run and 5 runs found by 
RNAEAPath in Table 1 respectively. And we use '-' to 
mark test cases that methods fail to apply to in the table. 
For all methods, free energies of the intermediate struc- 
tures of the folding pathways (including PathwayHunter) 
are evaluated based on the Turner model using RNAeval 
(with -dl option) from the Vienna RNA Package [25]. 
The default configuration parameters of RNAEAPath are 
as follows. MAX is 10, 7 is 5, £ is 100, € 1 is 10, £ 2 is 5 and 
£ 3 is 100. Due to the stochastic nature of the evolutionary 
algorithm, we report the best energy barrier of RNAEA- 
Path found over both 1 run and 5 runs. 

BARRIERS is the only exact solution that produces 
indirect pathways based on the Turner model. BAR- 
RIERS has already been compared with existing heuristic 
algorithms on the same test cases in the work of Dotu 
et al. [28]. We put the results of BARRIERS in the table 



Table 1 Benchmarks of BARRIERS, PathwayHunter, Findpath, RNATabuPath, and RNAEAPath for predicting folding 



pathways between conformational switches on the 1£ 


> test cases 








Instance 


BARRIERS PathwayHunter 


FindPath 


RNATabuPath 




RNAEAPath 








(n = 1000) 


(n= 1) 


(n = 5) 


rbl 




24.04 


24.04 


23.2 


22 


rb2 


10 


8.2 


7.25 


6.5 


6.5 


rb3 




22.4 


17.9 


17.5 


16.7 


rb4 




16.9 


16.9 


16.9 


16.9 


rb5 




24.54 


24.54 


21.44 


21.44 


hok 




28.5 


29.66 


20.7 


20.1 


SL 


11.80 


13 


12.9 


13.0 


12.9 


attenuator 


8.3 


8.7 


8.6 


8.7 


8.5 


si 5 


6.60 


7.1 


6.6 


7.1 


7.1 


sbox leader 


7.9 


5.2 


5.2 


5.2 


5.2 


thiM leader 




16.13 


14.84 


12.3 


12.3 


ms2 


11.6 


6.6 


6.6 


6.6 


6.6 


HDV 


23.53 


17.4 


17.0 


16.8 


16.8 


dsrA 


8.0 


8.3 


8.2 


8.0 


8.0 


ribD leader 




10.71 


9.5 


9.5 


9.5 


amv 


12.2 


5.8 


5.8 


5.74 


5.74 


alpha operon 


11.8 


6.5 


6.5 


6.1 


6.1 


HIV-1 leader 


14.3 


9.3 


11.3 


8.9 


8.9 



Benchmarks of BARRIERS, PathwayHunter, Findpath, RNATabuPath, and RNAEAPath for predicting folding pathways between conformational switches. Energy 
barriers (measured in kcal/mol) of the best folding pathways over n runs are shown. Boldface numbers are the best energy barriers found by the heuristic 
algorithms. 
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just for the sake of comparison. It has been pointed out 
that BARRIERS gives provably globally optimal pathways 
in 4 out of 18 cases (i.e. SL, attenuator, sl5 and dsrA). 
BARRIERS can not be directly applied to 5 cases 
because either the initial or the end structure is not 
locally optimal (i.e. rb2, sbox leader, ms2, amv and 
alpha operon), and can not converge in the remaining 
cases. Possibly due to the fact that both the number of 
RNA secondary conformations to consider and the com- 
putational resources required increase exponentially 
with the growing length of the RNA sequence and the 
growing range of energy barrier. PathwayHunter is an 
exact algorithm capable of producing the optimal direct 
folding pathways based on the Nussinov model. 
PathwayHunter can not be directly applied to 10 cases, 
because it requires the pair of input structures being 
able to form a 'pairwise-optimal' bipartite conflicting 
graph (see the work of Thachuk et al. [17] for details). 
It is not surprising that the performance of the exact 
algorithm, PathwayHunter, evaluated by free energy (in 
kcal/mol), is worse than the heuristic algorithms. This is 
because PathwayHunter is optimized based on the Nus- 
sinov model and only produces direct pathways, while 
the optimal direct pathways predicted based on the Nus- 
sinov model may not be the optimal pathways (consid- 
ering both direct and indirect pathways) based on the 



Turner model. All the remaining three methods are 
heuristics capable of producing both direct and indirect 
pathways based on the Turner model. Findpath pro- 
duces folding pathways very quickly, however it per- 
forms worse than both RNATabuPath and RNAEAPath 
in most cases. RNATabuPath performs better than Find- 
path, but produces less optimal pathways than RNAEA- 
Path. The energy barriers predicted by RNAEAPath over 
5 runs are exactly the same as RNATabuPath in 5 cases, 
worse in 1 case, and better in all the remaining 12 cases. 

Other heuristic algorithms (including a greedy algo- 
rithm of Voss et al. [26], a semi-greedy modification of 
the greedy algorithm, a greedy algorithm of Morgan, 
and Higgs [16] for predicting direct pathways and a var- 
iant of the Morgan-Higgs greedy algorithm capable of 
producing indirect pathways), that have been shown to 
perform considerably worse than RNATabuPath [28], 
are not listed. 

By analyzing the best folding pathways produced by 
RNAEAPath, we found that most high-quality pathways 
involve the melting of stacks in the initial structure, the 
(possibly simultaneous) construction of stacks in the final 
structure, and the formation of auxiliary temporary 
stacks for obtaining folding pathways with lower energy 
barriers. We may take the lowest energy barrier folding 
pathway of rb2 found by RNAEAPath, shown in Figure 5 
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Figure 5 The best folding pathway predicted for rb2. The near optimal indirect pathway between the two conformational secondary 
structures of the adenine riboswitch from V. vulnificus (rb2) predicted RNAEAPath. 
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as an example. The stack colored in red is an auxiliary 
temporary stack introducing intermediate structures with 
lower free energies (which is constructed using M4). 
Some of the stacks in the initial structure (in blue) are 
gradually melting, while at the same time, an incompati- 
ble stack (in green) is being formed (which is constructed 
using IM5). The stack colored in red is an auxiliary tem- 
porary stack introducing intermediate structures with 
lower free energies. This example convinces us that the 
advantages of RNAEAPath mainly come from employing 
mutation strategies that guide the construction of folding 
pathways by the formation and destruction of stacks and 
introducing additional stacking interactions that are 
important for stabilizing the intermediate structures. 



Detailed low energy barrier folding pathways for all the 
test cases are available on RNAEAPath web site. 

Control parameters and performance 

In order to evaluate the performance of RNAEAPath 
with different parameter configurations, we played with 
several other control parameters, including £ lt the num- 
ber of top offsprings preserved in the next generation, 
varying from 1 to 16, £ 3 , the size of population in each 
generation, varying from 80 to 120 and £, the total 
number of offsprings each individual is expected to pro- 
duce, varying from 80 to 120. The detailed results are 
shown in Additional file 1. In general, RNAEAPath pro- 
duces pathways of roughly the same quality for most 
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Figure 6 Performance of RNAEAPath in each generation. Energy barriers (in kcal/mol) of the best folding pathways of 18 conformational 
switches by the end of each generation during a typical run of RNAEAPath using the default parameters. 
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test cases with different control parameters, among 
which the default parameter setting is the best. 

We explored the relationship between the performance 
of RNAEAPath and the number of generations completed 
by plotting energy barriers of the best folding pathways 
produced by RNAEAPath with the default parameters in 
each generation, as shown in Figure 6. In general, the 
energy barriers decrease dramatically in the first one or 
two generations, and then the decrements slow down and 
finally plateau within 10 generations. For instance, in the 
case of rb3, the predicted energy barriers of folding path- 
ways in the initial population is 27.3 kcal/mol. It decreases 
by 7.2 kcal/mol (24.9%) through the first two generations 
and decreases by 2.5 kcal/mol (9.2%) through the next 
three generations. Through all the remaining generations, 
no further improvement is made. 

We also evaluated the execution time for each run of 
RNAEAPath. All the tests were performed on a 32 bit PC 
with 2.4 GHz Quad-processor and 3.2 GB memory, run- 
ning Fedora 11. With the default control parameters, 
RNAEAPath terminates in 1 minute in the best case (rb4), 
445 minutes in the worst case (hok), and 43 minutes on 
average. The detailed running times are shown in Addi- 
tional file 1. We did not perform direct comparisons 
between the running time of RNATabuPath and that of 
RNAEAPath, since RNATabuPath is only accessible via 
web server. 

Conclusions 

In conclusion, we have presented a new algorithm, 
RNAEAPath, for predicting low energy barrier folding 
pathways between conformational structures. RNAEAPath 
guides the construction of folding pathways through the 
destruction and formation of RNA stacks using various 
types of mutation strategies, and integrates them in a well- 
established computational framework of evolutionary algo- 
rithm. These mutation strategies can help reduce the 
search space and make it easier to jump out of local 
optima. By analyzing the results, we confirmed that most 
of the best folding pathways involve the formation of aux- 
iliary stacks, or involve the cooperative formation and dis- 
ruption of incompatible stacks. The benchmarking results 
show that RNAEAPath outperforms the existing heuristics 
on most test cases. We believe that this is because the 
construction of folding pathways in RNAEAPath captures 
important biological findings. 

Additional material 



Additional file 1: Supplementary data for predicting folding 
pathways between RNA conformational structures guided by RNA 
stacks. Supplementary data for predicting folding pathways between 
RNA conformational structures guided by RNA stacks in a PDF file. 
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